Discovering semantic links in texts
Corpora and projects

Dan Cristea
dcristea@info.uaic.ro
Faculty of Computer Science, “Alexandru Ioan Cuza” University of Iași, and Institute of Computer Science, Iași Branch of the Romanian Academy

Part I
The QuoVadis experiment
Making the implicit explicit!
• The corpus and motivation
• Conventions of annotation
• What do we expect from it?

MDIS Sibiu, 29-30 October 2015

The ‘QuoVadis’ corpus

A corpus of entities and semantic relations
• Entity types:
  – persons;
  – gods;
  – groups of persons and gods;
  – body parts of persons and gods;
• Semantic relations linking these entity types

Entities
• individuals (Marcus Vinicius, Lygia), groups (the Christians, the soldiers) and classes (the emperor);
• syntactic realisation: NPs (determiners – a soldier, adjectives – young patrician, complement PPs included – the son of one consul; but no relative clauses);
• included entities for the Romanian language: [te]₁ [iubesc; REALISATION=INCLUDED]₂, Marcus vs. [I]₂ love [thee]₁, Marcus
• nested referential expressions: [the adherents [of Christ]₂]₁ are praying…

Relations
• Anaphoric relations: co-referential
• Non-anaphoric relations:
  – kinship
  – affective
  – social

Anaphoric relations
• coref
• coref-interpret
• member-of, has-as-member (inverse)
• isa, class-of (inverse)
• part-of, has-as-part (inverse)
• subgroup-of, has-as-subgroup (inverse)
• has-name, name-of (inverse)
Example: [Lygia]₁ was unable to answer, for weeping seized [her]₂ anew. Acte gathered [the maiden]₃ to her bosom, and strove to calm [her]₄ excitement.
coref; coref-interpret; coref

Kinship relations
• parent-of
• child-of (inverse of parent-of)
• grandparent-of and grandchild-of (inverse)
• sibling (symmetrical)
• aunt-uncle-of, nephew-of (inverse relation)
• cousin-of (symmetrical)
• spouse-of (symmetrical)
• unknown
Example: "Pardon me, Lygia. For me
thou art [ [of a king]₂]₁ and [ [of Plautius]₄]₃"
child-of; child-of

Social relations
• superior-of
• inferior-of
• in cooperation-with
• colleague-of
• in competition-with
• opposite-to
Example: [Petronius]₁… but to [his]₂ misfortune [he]₃ [Cæsar himself]₄, hence [he]₅ roused [his]₆ jealousy
in competition-with; coref; coref; coref

Affective relations
• love
• loved-by
• hate
• hated-by
• upset
• friendship
• worship
Example: Vinicius entered Lygia's dungeon and remained there till daylight… Both changed by degrees into sad souls with [each]₁ [other]₂
rec-love

Identification of the arguments of a relation
• recursive (nested):
  – the anaphor/source is larger than the antecedent/destination
• non-recursive:
  – referential:
    • the anaphor is to the right of the antecedent
  – non-referential:
    • from source to destination, reading the trigger

General statistics over the corpus
• 7,281 sentences
• 146,822 tokens, punctuation included
• 24,636 entity mentions
• 22,301 referential relations
• 755 AKS relations (Affective + Kinship + Social)
• 752 triggers

Example: affective relations love and worship in the corpus

Example: affective relations fear-of and hate in the corpus

Vinicius’ links with other characters

The distribution of semantic relations involving the main character Vinicius

‘QuoVadis’ in the LLOD world
- Developing techniques for deciphering the semantic content of texts
  - identifying characters and linking their mentions
  - revealing internal relations that would allow:
    - intelligent searches (for instance: how do the sentiments between Vinicius and Lygia evolve?),
    - static connections between entities (for instance, family trees),
    - visualising statistics about entities, etc.

‘QuoVadis’
• http://nlptools.infoiasi.ro/Resources.jsp

Acknowledgements for Part I
• Anca Bibiri (thanks also for many slides), Cătălina Mărănduc, Paul Diac, Daniela Gîfu, Mihaela Colhon, Andrei Scutelnicu and the master's students in CL

Part II
Linking books in the virtual and real world: MappingBooks
• Origins of the project
• A mapped book…
• Features of the technology
• The architecture of the system
• Conclusions

MappingBooks
Get out of the book, into the virtual and real world!

I like to read books and to travel…

I need help to remember all kinship relations between characters

Characters in The Forsyte Saga
• The old Forsytes
  Ann, the eldest of the family
  Old Jolyon, the patriarch of the family, having made a fortune in tea
  James, a solicitor, married to Emily, a most tranquil woman
  Swithin, James's twin brother with aristocratic pretensions; a bachelor
  Roger, "the original Forsyte"
  Julia (Juley), a fluttery dowager; Mrs. Septimus Small
  Hester, an old maid
  Nicholas, the wealthiest in the family
  Timothy, the most cautious man in England
  Susan, the married sister
• The young Forsytes
  Young Jolyon, Old Jolyon's artistic and free-thinking son, married three times
  Soames, James and Emily's son, an intense, unimaginative and possessive solicitor, married to the unhappy Irene, who later marries Young Jolyon
  Winifred, Soames's sister, one of the three daughters of James and Emily, married to the foppish and lethargic Montague Dartie
  George, Roger's son, a dyed-in-the-wool mocker
  Francie, George's sister and Roger's daughter, emancipated from God
• Their children
  June, Young Jolyon's defiant daughter from his first marriage; engaged to an architect, Philip Bosinney, who becomes Irene's lover
  Jolly, Young Jolyon's son from his second marriage; dies of enteric fever during the Boer Wars
  Holly, Young Jolyon's daughter from his second marriage, to June's
    governess
  Jon, Young Jolyon's son from his third marriage, to Irene, Soames's first wife
  Fleur, Soames's daughter from his second marriage, to a French Soho shopgirl, Annette; Jon's lover; later marries a baronet, Michael Mont
  Val, Winifred and Montague's son; fights in the Boer Wars; marries his cousin Holly
  Imogen, Winifred and Montague's daughter
• Others
  Parfitt, Old Jolyon's butler
  Smither, Aunts Ann, Juley and Hester's housekeeper
  Warmson, James and Emily's butler
  Bilson, Soames's housemaid
  Prosper Profond, Winifred's admirer and Annette's lover

Going out of the book…
[Figure: Google Maps walking directions, printed 10/3/13, from Katip Çelebi Mh., Maç Sk., Beyoğlu, Turkey to Çukur Cuma Cd., Beyoğlu, Turkey; 400 m, about 4 mins]
1 Head southwest on Maç Sk. toward Baltacı Çk.: 75 m, about 47 secs (total 75 m)
2 Turn right onto Turnacıbaşı Cd.: 28 m (total 100 m)
3 Turn left onto Ağa Külhanı Sk. (Altıpatlar Sk.): 130 m, about 2 mins (total 240 m)
4 Continue onto Çukur Cuma Cd.: 150 m, about 1 min (total 400 m)

Idea
• Right now: each book – so many
    readers…
• MappingBooks: I buy a book – wow, it was specially written for me!

Towards live books
• multi-dimensional mash-ups combining textual, geographical and temporal data
• spot the mentions in the book (persons and locations)
• make heavy use of entity linking techniques => connecting entity mentions to the virtual world
• links sensitive to:
  – the context of mentions in the book
  – the current location of the reader
  – the moment the reader initiates an access
  – the personality of the reader

A “mapped book”
• a book connected with events/locations/persons in the real and virtual world
• the reader gets selective information, depending on a personal portrait (cultural and touristic preferences, taken from social sites…) and instantaneous info (e.g., location, as sensed by the mobile/tablet)

Use cases
- I visit a town with a guide in my hand
  - places of interest are re-ranked depending on my position
- I am a high school boy, travelling by train from Sibiu to Bucharest…
  - if I open my tablet and hold it up to the right-hand window, it will point out the Făgăraș peaks, as in my Geography textbook
- I am in Paris for the third time…
  - but only now my MB Lonely Planet guide to Paris tells me about this exhibition in the Pyramid
- I arrive in Cologne for the first time and open my tablet on reaching the Central Station…
  - my MB guide points me directly to the Cathedral (Dom)

Exploitation of textual information in MappingBooks
Aims
1) connect entities’ mentions in the form of nominals (noun phrases) => one coreferential chain corresponds to each entity;
2) no preliminary records about linked entities => the knowledge base evolves from scratch;
3) look especially for coreferential (identity of entity mentions) and geographical relations (position, distance, point-of, near, intersects, etc.);
4) texts under investigation: Geography textbooks and travel guides

Entity linking
• Challenges in entity linking:
  – name
    variations
  – ambiguities
  – absence
    • entity
    • link type

Entity types in MB
• Type PERSON
• Type LOCATION
• Type ORGANISATION
• Type URL
• Type TIMEX

Textual realisation of entities
• Syntactic realisation: NPs (proper nouns, common nouns, adjectives, complement PPs; but NO relative clauses)
• Characterised by distinctive heads
  – [the house on the [mountain]]
• If intersected → nested (imbricated)
  – [the museum [Grigore Antipa]]

Features we want to have
• The capacity to see a text as more than a string of letters
  – sentence splitting
  – tokenisation
  – POS-tagging
  – lemmatisation
  – NP chunking
  – anaphora resolution
TEXT ANALYTICS

Features we want to have
• Know who’s who
  – recognise names and types
  – disambiguate names
  – recognise an entity in the text even if mentioned by a common noun or a pronoun
  – use an ontology of types
NAMED ENTITY RECOGNITION

Features we want to have
• What virtual world entities are mentioned in the book?
  – link textual mentions to entities in the virtual world
  – decide what virtual info would be relevant to the user
  – employ multiple sources
ENTITY CRAWLING

Features we want to have
• Fetch, process and make use of geo-data
  – Geographic Information Systems (GIS)
  – geographic layers
GEOGRAPHY

Features we want to have
• Trace on a map a spatial relation described in the book
  – spatial relations detection in text
  – use Google Maps-like geo-strata (actually we procured our own maps)
  – trace locations and paths on maps
RELATIONS DETECTION
MAPS & TRAJECTORIES

Features we want to have
• Know where I am
• What real world entities are in my proximity
  – detection of my position
  – computation of distances from the mentioned places
  – signalling “interesting” locations in proximity
DEVICE INFO

Features we want to have
• Mix images with generated info
  – locate the position of the user (GPS)
  – sense the orientation of the camera (compass)
  – process images => segmentation, contours, recognition
  – decide info to be displayed
AUGMENTED REALITY

Features we want to have
• Attractive user interfaces
  – analyse use cases
  – design dedicated user interfaces
  – accommodate on the screen a segment of text, a map, the user’s position, web info, etc.
INTERFACES

Features we want to have
• Client-server
  – user’s portrait
  – the databases
  – standards and communication protocols
CLIENT-SERVER

Other issues…
• RESOURCES
  – find the texts
  – clear IPR
  – perform annotation
  – find other relevant linguistic data

TA = Text Analytics
NER = Named Entity Recognition
AR = Augmented Reality
EC = Entity Crawling
DEV = Device Info
RD = Relations Detection
INT = Interfaces
GEO = Geography
RES = Resources
M&T = Maps and Trajectories
M&E = Management and Evaluation

MappingBooks are addressed to…
• School children – to
    put books back in their hands (lost paradise), by boosting interactivity based on readings
• Adolescents, adventurers, travellers, hikers and mountaineers – to socialize on common travels
• Pensioners – to network based on common readings and cultural preferences
• Researchers in LT & Computational Linguistics – to get access to heavily annotated linguistic resources
• Providers of textual data (publishing houses, media companies, newspapers) – to better sell their books
• Local administrations and tourist agencies – to advertise places described in famous books
• If in extensive use, MB could enhance the European common repository of language resources

Acknowledgements for Part II
• MappingBooks is a project supported by a grant of the Romanian Ministry of Education and Research, July 2014 – June 2016
• Our students in Computer Science, for developing a prototype of the system during their project in AI, in the Autumn – Winter term of 2013-2014…
• My colleagues: Ionuț Pistol, Daniela Gîfu, Daniel Anechitei, Prof. Mihai Niculiță in the Dept. of Geography

Thank you!
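
Backup sketch: the DEVICE INFO feature (computing distances from the reader's position to the places mentioned in the book) and the first use case (re-ranking places of interest by position) could look roughly like the code below. This is only an illustration under stated assumptions, not the MappingBooks implementation: the function names `haversine_km` and `rank_by_proximity`, the spherical-Earth haversine formula, the sample coordinates and the 100 km cut-off are all hypothetical choices made for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def rank_by_proximity(reader_pos, places, max_km=None):
    """Return (name, distance_km) pairs, nearest first.

    reader_pos: (lat, lon) of the reader, e.g. as reported by the device.
    places: dict mapping a mentioned place name to its (lat, lon).
    max_km: if given, places farther away than this are dropped.
    """
    ranked = sorted(
        ((name, haversine_km(reader_pos[0], reader_pos[1], lat, lon))
         for name, (lat, lon) in places.items()),
        key=lambda pair: pair[1],
    )
    if max_km is not None:
        ranked = [pair for pair in ranked if pair[1] <= max_km]
    return ranked


# Hypothetical data: approximate coordinates of places a guide might mention.
places = {
    "Sibiu": (45.80, 24.15),
    "Bucharest": (44.43, 26.10),
    "Iași": (47.16, 27.59),
}
reader = (45.60, 24.62)  # hypothetical reader position, somewhere on the train

for name, km in rank_by_proximity(reader, places, max_km=100):
    print(f"{name}: about {km:.0f} km away")
```

Sorting by distance gives the re-ranking; the optional `max_km` filter stands in for signalling only the “interesting” locations in proximity. A real system would also weigh the reader's portrait and the context of the mention, which this sketch ignores.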